An investigation of the influence of indexing exhaustivity and term distributions on a document space

Authors

  • Dietmar Wolfram
  • Jin Zhang
Abstract

The authors investigate the influence of index term distributions and indexing exhaustivity levels on the document space within a visual information retrieval environment called DARE. Using combinations of three levels of term distributions (shallow, observed, steep) and indexing exhaustivity (low, observed, high), hypothetical document sets were generated and projected onto the DARE environment. The results from the simulated document sets demonstrate the importance of term distribution and exhaustivity characteristics for the density of document spaces and their implications for retrieval, particularly when different term weighting schemes are used. The results also demonstrate how different combinations of exhaustivity and term distributions may result in similar document space density characteristics.


Similar resources

Relational indexing for the retrieval of interlinked structured documents

In information retrieval on classical structured documents, one problem consists in browsing the result space using the structure of the documents. Taking into account other links between doxels compounds this problem. In this article, we consider relative exhaustivity and relative specificity values computed on non-compositional linked doxels to index the corpus; adding this information to th...


An Analysis Method on Post-earthquake Traversability of Road Network Considering Building Collapse

This study aims to quantify the influence of building collapse during earthquakes on the traversability of road networks. To this end, an analysis method for the post-earthquake traversability of road networks that considers building collapse is proposed. First, the time-history analysis of seismic response based on the multi-degree of freedom (MDOF) model is performed for regional b...


A new model for phrase search based on minimum weighted displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...


A probabilistic topic model based on local word relationships in overlapping windows

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents, to extract the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...


A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...



Journal title:
  • JASIST

Volume 53


Publication date: 2002